Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Authors
Abstract
Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. It is commonly believed that, in general, the best one can hope for from such an approach is a local optimum of this criterion. In this article, we show the following surprising result: any (approximate) local optimum enjoys a global performance guarantee. We compare this guarantee with the one satisfied by Direct Policy Iteration, an approximate dynamic programming algorithm that performs some form of Policy Search: while the approximation error of Local Policy Search may generally be larger (because local search requires considering a space of stochastic policies), we argue that the concentrability coefficient appearing in the performance bound is much nicer. Finally, we discuss several practical and theoretical consequences of our analysis.
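To make the criterion concrete: for a parameterized stochastic policy pi_theta, Local Policy Search maximizes J_nu(theta) = E_{s~nu}[V^{pi_theta}(s)], the value function averaged over the predefined distribution nu. The short Python/NumPy sketch below illustrates this as plain gradient ascent on J_nu for a softmax policy on a small, randomly generated MDP; the MDP, the uniform choice of nu, and the finite-difference gradient are illustrative assumptions and are not taken from the paper.

# A minimal sketch, assuming a small tabular MDP: Local Policy Search as
# gradient ascent on J_nu(theta) = sum_s nu(s) * V^{pi_theta}(s) with a
# softmax (stochastic) policy. The MDP, nu, and the finite-difference
# gradient are illustrative choices, not the paper's algorithm.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.uniform(size=(n_states, n_actions))                       # reward r(s, a)
nu = np.ones(n_states) / n_states                                  # predefined state distribution nu

def policy(theta):
    # Softmax policy pi_theta(a | s); theta has shape (n_states, n_actions).
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def value(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi.
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def J(theta):
    # The criterion: value function averaged over nu.
    return nu @ value(policy(theta))

# Local search: plain gradient ascent with a central finite-difference gradient.
theta, step, h = np.zeros((n_states, n_actions)), 0.5, 1e-5
for _ in range(200):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        d = np.zeros_like(theta)
        d[idx] = h
        grad[idx] = (J(theta + d) - J(theta - d)) / (2 * h)
    theta += step * grad

print("J_nu at the (approximate) local optimum:", J(theta))

At convergence, theta is an (approximate) local optimum of J_nu, which is exactly the kind of point to which the paper's global performance guarantee applies.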
Similar articles
Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search
Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. In general, the best one can hope for from such an approach is a local optimum of this criterion. The first contribution of this article is ...
On the Performance Bounds of some Policy Search Dynamic Programming Algorithms
We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on Policy Search algorithms, which compute an approximately optimal policy by following the standard Policy Iteration (PI) scheme via an ε-approximate greedy operator (Kakade and Langford, 2002; Lazaric et al., 2010). We describe existing and a few new performance bounds for Direc...
Modify the linear search formula in the BFGS method to achieve global convergence.
Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent ...
Constrained Nonlinear Optimal Control via a Hybrid BA-SD
The non-convex behavior presented by nonlinear systems limits the application of classical optimization techniques to solve optimal control problems for these kinds of systems. This paper proposes a hybrid algorithm, namely BA-SD, by combining Bee algorithm (BA) with steepest descent (SD) method for numerically solving nonlinear optimal control (NOC) problems. The proposed algorithm includes th...
Journal: CoRR
Volume: abs/1306.1520
Issue: -
Pages: -
Publication year: 2013